ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale

Authors

Frank Mueller

Xing Wu

Martin Schulz

Bronis R. de Supinski

Todd Gamblin

Abstract

Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code and system complexity and long execution times. An alternative to running the actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless, scalable trace collection, we contribute an approach that provides communication traces that are orders of magnitude smaller, if not near constant in size, regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events, develop a scheme to preserve the time and causality of communication events, and present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and on trace extrapolation. To the best of our knowledge, such a concise, scalable representation of MPI traces combined with time-preserving deterministic MPI call replay is without precedent.

Related resources

Tools for Simulation and Benchmark Generation at Exascale

The path to exascale high-performance computing (HPC) poses several challenges related to power, performance, resilience, productivity, programmability, data movement, and data management. Investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-desig...

Scalable communication event tracing via clustering

Communication traces help developers of high-performance computing (HPC) applications understand and improve their codes. When run on large-scale HPC facilities, the scalability of tracing tools becomes a challenge. To address this problem, traces can be clustered into groups of processes that exhibit similar behavior. Instead of collecting trace information of each individual node, it then suf...

High Performance Computing in Hydro- and Environmental Engineering

High Performance Computing (HPC) can be understood as the interaction of parallel and adaptive methods with fast solvers on powerful parallel computers. Therefore, an introduction to parallel and adaptive methods as well as to fast solvers is given. The interaction of these methods is demonstrated using three examples which deal with groundwater flow and transport processes, gas-water flow as we...

hpsgprof: A New Profiling Tool for Large-Scale Parallel Scientific Codes

Contemporary High Performance Computing (HPC) applications can exhibit unacceptably high overheads when existing instrumentation-based performance analysis tools are applied. Our experience shows that for some sections of these codes, existing instrumentation-based tools can cause, on average, a fivefold increase in runtime. Our experience has been that, in a performance modelling context, thes...

A Model for evaluating the fire resistance of high performance concrete columns

A numerical model, in the form of a computer program, for evaluating the fire resistance of high performance concrete (HPC) columns is presented. The three stages, associated with the thermal and structural analysis, for the calculation of fire resistance of columns are explained. A simplified approach is proposed to account for spalling under fire conditions. The use of the computer program for...

Publication date: 2010
